TITLE OF THE INVENTION

IDENTIFIER GENERATING METHOD, IDENTITY DETERMINING
METHOD, IDENTIFIER TRANSMITTING METHOD, IDENTIFIER
GENERATING APPARATUS, IDENTITY DETERMINING APPARATUS,
AND IDENTIFIER TRANSMITTING APPARATUS

BACKGROUND OF THE INVENTION 
Field of the Invention 

[0001] The present invention relates to a method of generating an identical identifier for multiple document data that differ in expression but are identical in meaning, a method of determining identity using the identifier, a method of transmitting the identifier thus generated, an identifier generating apparatus, an identity determining apparatus, and an identifier transmitting apparatus.

Related Background Art 

[0002] The spread of XML has drawn attention to Web services, an architecture for implementing dynamic connections among various services present on wide area networks typified by the Internet.

[0003] In Web services, XML techniques are expected to be used for describing network protocols and service interfaces, managing contents, and so on. However, because XML documents are described with tags, their file sizes are much larger than those of existing HTTP messages. As a result, the load on the networks becomes heavier and the processing time at terminals or servers becomes longer. Processing based on identifiers uniquely generated from XML documents is therefore being considered, in order to lessen the load on the networks and simplify the processing.

[0004] One identifier generating method regards an XML document as a sequence of characters and uses, as the identifier, the value obtained by applying a one-way function to that sequence (see, e.g., Japanese Patent Application Laid-Open No. 2001-282105). To simplify the description of XML documents, however, the XML Specification allows flexibility so that XML processing is not affected by fluctuation of expression among the describers of XML documents. For example, any number of white spaces may be interposed without any effect, a close tag may be omitted, comments can be described, and a document may be described in any desired way as long as it follows the type definitions. The RDF Specification permits the constituent elements of document data to be described in any order, while the entire document data is handled as having the same meaning. In the CC/PP Specification, which is defined according to the RDF Specification, a URI can be used to specify originally defined default data, and only the difference from that data is described, so that the description of all other data can be omitted. In the above-described prior art, therefore, XML documents or RDF documents with the same original meaning may be regarded as different documents when analyzed as sequences of characters, because of fluctuation of expression, differences in types, differences in the ordering of constituent elements, description by default data and difference data, and so on. Namely, since identifiers are generated from XML documents or RDF documents by the one-way function or the like, an identical identifier is not always generated for documents with the same meaning.

SUMMARY OF THE INVENTION 

[0005] The present invention has been accomplished to solve the above problem. An object of the invention is to provide: an identifier generating method of generating an identical identifier for XML documents or RDF documents that have the same original meaning; an identity determining method of determining the identity of multiple document data using the identifier; an identifier transmitting method of transmitting the identifier; and an identifier generating apparatus, an identity determining apparatus, and an identifier transmitting apparatus capable of implementing those methods.

[0006] An identifier generating method (identifier generating apparatus) according to the present invention comprises: a canonicalization process step (canonicalization process means) of subjecting document data to a canonicalization process to correct fluctuation of expression; and an identifier generating step (identifier generating means) of generating an identifier uniquely specifying the document data or part thereof, based on all or part of the document data having been subjected to the canonicalization process in the canonicalization process step.

[0007] The above identifier generating method (identifier generating apparatus) may be characterized in that the canonicalization process step (canonicalization process means) comprises a type standardization process step (type standardization process means) of, using a class definition file of the document data describing a definition of a type, standardizing a type of expression for a value described in the document data, in accordance with the type defined by the class definition file.

[0008] The above identifier generating method (identifier generating apparatus) may be characterized in that the type standardization process step (type standardization process means) is configured to standardize an accuracy of numerical data described in the document data, in accordance with a definition of a type for numerical data described in the class definition file of the document data.

[0009] The above identifier generating method (identifier generating apparatus) may be characterized in that the canonicalization process step (canonicalization process means) comprises a document data generating step (document data generating means) of transforming first partial data and second partial data into document data in accordance with a predetermined transformation rule.

[0010] An identity determining method (identity determining apparatus) according to the present invention comprises: a canonicalization process step (canonicalization process means) of subjecting document data to a canonicalization process to correct fluctuation of expression; an identifier generating step (identifier generating means) of generating an identifier uniquely specifying the document data or part thereof, based on all or part of the document data having been subjected to the canonicalization process in the canonicalization process step (by the canonicalization process means); and an identity determining step (identity determining means) of determining whether there is a common portion between one document data and another document data, based on the identifier having been generated in the identifier generating step (by the identifier generating means).

[0011] The above identity determining method (identity determining apparatus) may be characterized in that the canonicalization process step (canonicalization process means) comprises a type standardization process step (type standardization process means) of, using a class definition file of the document data describing a definition of a type, standardizing a type of expression for a value described in the document data, in accordance with the type defined by the class definition file.

[0012] The above identity determining method (identity determining apparatus) may be characterized in that the type standardization process step (type standardization process means) is configured to standardize an accuracy of numerical data described in the document data, in accordance with a definition of a type for numerical data described in the class definition file of the document data.

[0013] The above identity determining method (identity determining apparatus) may be characterized in that the canonicalization process step (canonicalization process means) comprises a document data generating step (document data generating means) of transforming first partial data and second partial data into document data in accordance with a predetermined transformation rule.

[0014] The above identity determining method may be characterized in that it further comprises an identifier storing step of storing in advance the identifier generated in the identifier generating step into a cache, in correlation with the document data or with a result of a predetermined process on the document data. In this case, the identity determining step is configured to: search the cache on the basis of the identifier, generated in the identifier generating step, of the document data that is the target of the identity determination; determine that identical document data exists if the same identifier is present in the cache; and determine that no identical document data exists if the same identifier is absent.

[0015] The above identity determining apparatus may be characterized in that it further comprises a cache storing in advance the identifier generated by the identifier generating means, in correlation with the document data or with a result of a predetermined process on the document data. In this case, the identity determining means is configured to: search the cache on the basis of the identifier, generated by the identifier generating means, of the document data that is the target of the identity determination; determine that identical document data exists if the same identifier is present in the cache; and determine that no identical document data exists if the same identifier is absent.

[0016] The above identity determining method may be characterized in that it further comprises a second identifier generating step of generating, prior to the canonicalization process step, an identifier uniquely specifying the document data or part thereof, based on all or part of the document data. In this case, identity between one document data and another document data is first determined on the basis of the identifier generated in the second identifier generating step. If the two document data are determined to be identical, processing is terminated without execution of the next process step. If they are determined not to be identical, processing is transferred to the canonicalization process step.

[0017] The above identity determining apparatus may be characterized in that it further comprises second identifier generating means for generating, prior to execution of the canonicalization process by the canonicalization process means, an identifier uniquely specifying the document data or part thereof, based on all or part of the document data. In this case, identity between one document data and another document data is determined on the basis of the identifier generated by the second identifier generating means, and if they are determined not to be identical, the canonicalization process means performs the canonicalization process on the document data.

[0018] Another identity determining method (identity determining apparatus) according to the present invention comprises: an identifier generating step (identifier generating means) of generating, based on all or part of encoded data of document data, an identifier uniquely specifying the document data or part thereof; and an identity determining step (identity determining means) of determining whether there exists a common portion between one document data and another document data, based on the identifier having been generated in the identifier generating step (by the identifier generating means).

[0019] The above identity determining method (identity determining apparatus) may be characterized in that, when the identity determining step (identity determining means) determines that the two document data are identical, an instruction is issued to skip the process of decoding the encoded data of the document data.
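A minimal sketch of [0018] and [0019] follows. The identifier is computed directly from the encoded bytes, and decoding runs only when no identical document has been seen before. BiM itself is not reproduced here. `zlib` only stands in for the encoder so that there are concrete encoded bytes to hash. Unlike a BiM-style encoding, it does not map differently written documents with the same meaning to the same bytes. All class and method names are hypothetical.

```python
import hashlib
import zlib


class EncodedDocumentProcessor:
    """Identify documents by their encoded bytes; decode only unseen ones."""

    def __init__(self) -> None:
        self.results: dict[str, str] = {}  # identifier -> processing result
        self.decode_calls = 0

    def decode(self, encoded: bytes) -> str:
        self.decode_calls += 1
        return zlib.decompress(encoded).decode("utf-8")

    def process(self, encoded: bytes) -> str:
        # Identifier generated from the encoded data itself, without decoding.
        ident = hashlib.sha256(encoded).hexdigest()
        if ident in self.results:
            # Identical document already handled: the decoding step is skipped.
            return self.results[ident]
        document = self.decode(encoded)
        # Placeholder for whatever the terminal or server does with the document.
        self.results[ident] = document.upper()
        return self.results[ident]


proc = EncodedDocumentProcessor()
data = zlib.compress("<a>1</a>".encode("utf-8"))
proc.process(data)
proc.process(data)
print(proc.decode_calls)  # 1
```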

[0020] An identifier transmitting method (identifier transmitting apparatus) according to the present invention comprises: a canonicalization process step (canonicalization process means) of subjecting document data to a canonicalization process to correct fluctuation of expression; an identifier generating step (identifier generating means) of generating an identifier uniquely specifying the document data or part thereof, based on all or part of the document data having been subjected to the canonicalization process in the canonicalization process step (by the canonicalization process means); and an identifier transmitting step (identifier transmitting means) of transmitting the identifier having been generated in the identifier generating step (by the identifier generating means).

[0021] The above identifier transmitting method (identifier transmitting apparatus) may be characterized in that the identifier transmitting step (identifier transmitting means) comprises a transmitted data generating step (transmitted data generating means) of generating data obtained by replacing all or part of the document data with the identifier.

[0022] The above identifier transmitting method (identifier transmitting apparatus) may be characterized in that the transmitted data generating step (transmitted data generating means) is configured to generate transmitted data described by an identifier uniquely specifying partial data included in the document data, and difference data between the partial data and the document data.
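The transmitted data generating step of [0021] and [0022] can be sketched under simplifying assumptions. A document is modelled as a list of constituent-element strings, and the sender knows which identifiers the receiver already holds. The `{"ref": ...}` / `{"data": ...}` message layout and all function names are hypothetical. In practice each part would be canonicalized before hashing.

```python
import hashlib


def ident(part: str) -> str:
    """Identifier of one constituent element (SHA-256 of its text)."""
    return hashlib.sha256(part.encode("utf-8")).hexdigest()


def make_transmitted_data(parts: list[str], receiver_ids: set[str]) -> list[dict[str, str]]:
    """Replace every part the receiver already holds with its identifier."""
    message = []
    for part in parts:
        pid = ident(part)
        if pid in receiver_ids:
            message.append({"ref": pid})    # identifier only
        else:
            message.append({"data": part})  # data the receiver lacks, sent in full
    return message


def restore(message: list[dict[str, str]], cache: dict[str, str]) -> list[str]:
    """Receiver side: resolve identifiers against its own cache."""
    return [cache[item["ref"]] if "ref" in item else item["data"] for item in message]


receiver_cache = {ident(p): p for p in ["<hw>phone</hw>"]}
parts = ["<hw>phone</hw>", "<sw>browser 2.0</sw>"]
msg = make_transmitted_data(parts, set(receiver_cache))
print([sorted(item) for item in msg])         # [['ref'], ['data']]
print(restore(msg, receiver_cache) == parts)  # True
```

Because the identifier is much shorter than a large shared part, this replacement also acts as the data compression mentioned in [0032].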

[0023] (Action) 

[0024] To solve the aforementioned problem, the present invention executes the canonicalization process to correct the fluctuation of expression before generating the identifier for an XML document or RDF document. The canonicalization process, typified by XML-Canonicalization, corrects the fluctuation of expression permitted by the XML Specification, for example by deleting redundant white spaces and recovering an omitted close tag. The canonicalization process converts XML documents or RDF documents with the same meaning into documents described in the same expression. This permits an identical identifier to be generated for the documents by a function that generates an identifier from a sequence of characters, typified by the one-way function.

[0025] Here, the types of data described in XML documents or RDF documents may be standardized with reference to the class definition file of those documents. The type standardization process standardizes the accuracy or the like of Double type or Float type numerals. Like the canonicalization process, it converts XML documents or RDF documents with the same meaning into documents described in the same expression. It therefore also permits an identical identifier to be generated for the documents by a function that generates an identifier from a sequence of characters, typified by the one-way function.

[0026] The present invention may also be implemented as follows. When document data is described by default data and difference data according to the CC/PP Specification, multiple partial data are referred to in order to acquire the partial data for document data expressing the meaning of the original document data. The partial data are transformed according to a certain transformation rule to generate the original document data, and an identifier is then generated for it. Since the original document data is generated before the identifier is generated, XML documents or RDF documents with the same meaning are converted into documents in the same expression. An identical identifier can therefore be generated for the documents by a function that generates an identifier from a sequence of characters, typified by the one-way function.

[0027] Here, the identifier may be generated after execution of a process of rearranging the sequence of constituent elements of document data in accordance with a predetermined rule.

[0028] When the present invention executes any of the above processes before generating the identifier, an identical identifier can be generated for XML documents or RDF documents with the same meaning.

[0029] The present invention also permits the identity of multiple XML documents or RDF documents to be determined using the identifier generated by the above generating technique. In the identity determining method (identity determining apparatus) according to the present invention, the identity determining step (identity determining means) is configured to determine whether there exists a common portion between one document data and another document data. Namely, a determination can be made on the following cases: 1) whether part of one document data is identical with part of another document data; 2) whether one document data is identical with part of another document data; 3) whether part of one document data is identical with another document data; and 4) whether one document data is identical with another document data.
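The four cases can be checked by comparing the identifier of each whole document against the identifiers of its sub-elements. The following is a sketch that assumes namespace-free, well-formed XML. It uses the standard-library `xml.etree.ElementTree.canonicalize` (Python 3.8+) for canonicalization and SHA-256 as the one-way function. The function names are hypothetical, and a "part" is taken here to mean any element below the root.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Optional
from xml.etree.ElementTree import canonicalize


def c14n_id(xml_text: str) -> str:
    """Canonicalize (stripping surrounding white space in text), then hash."""
    canonical = canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def identifiers(document: str) -> tuple[str, set[str]]:
    """Identifier of the whole document and of every element below the root."""
    root = ET.fromstring(document)
    parts = set()
    for el in list(root.iter())[1:]:  # skip the root itself
        el.tail = None                # serialize the element without trailing text
        parts.add(c14n_id(ET.tostring(el, encoding="unicode")))
    return c14n_id(document), parts


def common_portion(doc_a: str, doc_b: str) -> Optional[str]:
    whole_a, parts_a = identifiers(doc_a)
    whole_b, parts_b = identifiers(doc_b)
    if whole_a == whole_b:
        return "case 4: whole of A == whole of B"
    if whole_a in parts_b:
        return "case 2: whole of A == part of B"
    if whole_b in parts_a:
        return "case 3: part of A == whole of B"
    if parts_a & parts_b:
        return "case 1: part of A == part of B"
    return None


print(common_portion("<x>1</x>", "<r>\n  <x> 1 </x>\n</r>"))
# case 2: whole of A == part of B
print(common_portion("<r><x>1</x><y>2</y></r>", "<s><y>2</y><z>3</z></s>"))
# case 1: part of A == part of B
```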
[0030] Here, the canonicalization process and the type standardization process may be arranged in a step-by-step manner: after execution of each process, the identifier is generated and identity is determined based on it. When an identity determining process determines that the documents in question are identical, processing may be terminated directly without transfer to the next stage, so as to decrease the processing time for the identity determination.
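The staged determination can be sketched as follows. The stage names are hypothetical, and a hard-coded set of double-typed element names (`DOUBLE_ELEMENTS`) stands in for a real class definition file. The first stage compares identifiers of the raw text, which corresponds to the second identifier generating step of [0016]. Later stages run only when the cheaper ones fail to show identity.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Callable, Optional
from xml.etree.ElementTree import canonicalize

DOUBLE_ELEMENTS = {"value"}  # hypothetical: would come from a class definition file


def raw(doc: str) -> str:
    return doc


def c14n(doc: str) -> str:
    return canonicalize(doc, strip_text=True)


def c14n_and_types(doc: str) -> str:
    root = ET.fromstring(c14n(doc))
    for el in root.iter():
        if el.tag in DOUBLE_ELEMENTS and el.text is not None:
            el.text = repr(float(el.text))  # one spelling per double value
    return ET.tostring(root, encoding="unicode")


STAGES: list[tuple[str, Callable[[str], str]]] = [
    ("raw", raw),
    ("canonicalization", c14n),
    ("type standardization", c14n_and_types),
]


def determine_identity(doc_a: str, doc_b: str) -> Optional[str]:
    """Return the name of the first stage at which the identifiers match."""
    for name, transform in STAGES:
        id_a = hashlib.sha256(transform(doc_a).encode("utf-8")).hexdigest()
        id_b = hashlib.sha256(transform(doc_b).encode("utf-8")).hexdigest()
        if id_a == id_b:
            return name  # identical: later, costlier stages are skipped
    return None


base = "<d><value>12.0</value></d>"
print(determine_identity(base, base))                               # raw
print(determine_identity(base, "<d>\n <value>12.0</value>\n</d>"))  # canonicalization
print(determine_identity(base, "<d><value>12.00</value></d>"))      # type standardization
```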
[0031] In the present invention, the identity of an XML document can also be determined using the identifier uniquely generated from all or part of the encoded data of the XML document. The encoding of an XML document assigns, to expressions with the same meaning, a code uniquely defined in advance according to a code transformation rule (reference should be made to ISO/IEC 15938 Part 1, Systems, Binary format (BiM)). The encoded data is therefore in a state in which the fluctuation of expression has been corrected. Namely, XML documents with the same meaning are encoded into identical encoded data, and the identifier is generated from the encoded data, treated as a sequence of characters, by the one-way function or the like. The identity determination can thus be made on XML documents with the same meaning.

[0032] The present invention enables multiple XML documents, RDF documents, or portions thereof with the same meaning to be identified. It thus simplifies the processing of XML documents, RDF documents, or portions thereof that have been processed in the past, reducing the processing time at terminals or servers. Because the present invention permits the identifier to be uniquely generated for an XML document, an RDF document, or a portion thereof, the identifier can also be used to compress document data. As with the identification, this can further simplify the processing of XML documents or RDF documents at terminals or servers.

[0033] The present invention will be more fully understood from the detailed description given hereinbelow and the accompanying drawings, which are given by way of illustration only and are not to be considered as limiting the present invention.

[0034] Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will be apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0035] Fig. 1 is a flowchart showing the operation of the identifier generating method according to an embodiment.

[0036] Fig. 2 is a flowchart showing the operation of the identifier generating method according to another embodiment.

[0037] Fig. 3 is a flowchart showing the details of the operation of type standardization process step S201.

[0038] Fig. 4 is a flowchart showing the operation of the identifier generating method according to another embodiment.

[0039] Fig. 5 is a flowchart showing the operation of the identity determining method according to another embodiment.

[0040] Fig. 6 is a flowchart showing the operation of the identity determining method according to another embodiment.

[0041] Fig. 7 is a flowchart showing the operation of the identity determining method according to another embodiment.

FP03-0240-00

[0042] Fig. 8 is a flowchart showing the operation of the identity determining method according to another embodiment.

[0043] Fig. 9 is a flowchart showing the operation of the identity determining method according to another embodiment.

[0044] Fig. 10 is a block diagram showing the configuration of the identity determining apparatus according to another embodiment.

[0045] Fig. 11 is a diagram showing a case of an XML document as an example of input document data.

[0046] Fig. 12 is a diagram showing an example of data after the canonicalization of the document data shown in Fig. 11, according to the XML-Canonicalization Specification.

[0047] Fig. 13 is a diagram showing an example of document data identifiers and document data URIs stored in the cache.

[0048] Figs. 14A to 14C are diagrams showing (A) document data 1 before the transformation of target document data, (B) document data 2 before the transformation of target document data, and (C) an example of the class definition file.

[0049] Figs. 15A and 15B are diagrams showing examples of document data after the transformation of target document data.

[0050] Fig. 16 is a block diagram showing the configuration of the item rewriting process apparatus.

[0051] Fig. 17 is a flowchart showing the operation of the identifier transmitting method according to another embodiment.

[0052] Figs. 18A and 18B are diagrams showing (A) an example of default data, and (B) an example of an RDF document as target document data.

[0053] Fig. 19 is a diagram showing document data after the transformation in document data generating step S401.

[0054] Fig. 20 is a block diagram showing the configuration of the identifier transmitting apparatus according to another embodiment.

[0055] Fig. 21 is a diagram showing an example of document data consisting of a plurality of constituent elements.

[0056] Fig. 22 is a diagram showing an example of document data transmitted by the identifier transmitting method according to another embodiment.

[0057] Fig. 23 is a diagram showing an example of the service system using the partly identifier-integrated document data transmitted by the identifier transmitting method according to another embodiment.

[0058] Fig. 24 is a flowchart showing the operation of the step-by-step identity determining method according to another embodiment.

[0059] Fig. 25 is a diagram for explaining encoded data.

[0060] Fig. 26 is a block diagram showing the configuration of the document data storing-acquiring system.

[0061] Fig. 27 is a diagram showing document data after rearrangement of the sequence of constituent elements in alphabetical order.
DESCRIPTION OF THE PREFERRED EMBODIMENTS 

[0062] [First Embodiment] 

[0063] The first embodiment of the present invention will be described below with reference to the drawings. Fig. 1 is a flowchart showing the operation of the identifier generating method according to the present embodiment.

[0064] As shown in the figure, the identifier generating method comprises: target document data acquiring step S101 of acquiring document data that is the target for generation of an identifier; canonicalization process step S102 of correcting fluctuation of expression of the target document data; and identifier generating step S103 of generating a unique identifier from the entire target document data or a selected range thereof.
[0065] Fig. 11 is a diagram showing a case of an XML document as an example of input document data. As shown in the figure, the input document data includes unnecessary white spaces and an omitted close tag. Canonicalization needs to be carried out before the identifier is generated, in order to prevent a different identifier from being generated because of the fluctuation of expression due to the describer.

[0066] Fig. 12 is an example of data canonicalized according to the XML-Canonicalization Specification from the document data shown in Fig. 11. As shown in the figure, the document data after the canonicalization process is free of the fluctuation of expression due to the describer, as a result of the deletion of unnecessary white spaces and the insertion of a close tag. An identifier is generated based on the document data shown in the figure.

[0067] In the identifier generating step S103, a unique identifier is generated from the entire document data or a selected range thereof after the canonicalization process. For example, a one-way function such as a hash function is applied, and the resulting hash value is used as the identifier. However, the function for generating the identifier does not have to be a one-way function; it may be any function that can generate a unique identifier.
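Steps S102 and S103 can be sketched minimally under one assumption: the input is well-formed XML. Python's standard parser cannot recover an omitted close tag, so that part of the canonicalization process is not shown. The standard-library function `xml.etree.ElementTree.canonicalize` (Python 3.8+) performs the XML canonicalization, and SHA-256 serves as the one-way function. The function name `generate_identifier` is hypothetical.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def generate_identifier(document: str) -> str:
    """Canonicalize an XML document (step S102) and hash it (step S103)."""
    # C14N 2.0 normalizes attribute order and quoting, expands empty elements,
    # and drops comments; strip_text=True also strips leading and trailing
    # white space from text content, so indentation no longer matters.
    canonical = canonicalize(document, strip_text=True)
    # SHA-256 plays the role of the one-way function.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


doc1 = '<a x="1" y="2"><b>text</b><c/></a>'
doc2 = "<a  y='2' x='1'>\n  <!-- note -->\n  <b> text </b>\n  <c></c>\n</a>"

# The raw character sequences differ, so hashing them directly would not match.
assert doc1 != doc2
print(generate_identifier(doc1) == generate_identifier(doc2))  # True
```

Note that `strip_text=True` is a design choice for this sketch. It treats leading and trailing white space in text as insignificant, which suits data-oriented documents but not mixed-content prose.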
[0068] Fig. 2 is a flowchart showing the operation of the identifier generating method in which the canonicalization process step S102 includes an additional type standardization process step S201 of confirming a class definition file of the target document data and standardizing the type thereof.

[0069] Fig. 3 is a flowchart showing the details of the operation of type standardization process step S201. As shown in the figure, the type standardization process comprises: step S301 of acquiring a class definition file of the target document data; data type confirming step S302 of confirming the type of every data item described in the target document data, based on the class definition file; and document data transforming step S303 of transforming the target document data according to the class definition file.

[0070] The document data transforming step S303 is configured to transform the data according to the data type described in the class definition file, which was confirmed in the data type confirming step S302. Figs. 14A and 14B are diagrams showing document data before the transformation of the target document data, and Fig. 14C is a diagram showing an example of the class definition file. As shown in Fig. 14C, the class definition file includes the description <ElementType name="value" dt:type="double"/>, which confirms that the type of the element value of "value" is the "double" type. It is also seen that Document Data 1 shown in Fig. 14A and Document Data 2 shown in Fig. 14B include the descriptions 12.0 and 12.00, respectively, of the double type, as element values of "value." Although both documents are described in accordance with the type definition, they differ as sequences of characters.

[0071] Figs. 15A and 15B are examples of document data after the transformation of the target document data. As shown in these figures, the accuracy of the element values of "value" defined as the "double" type is made equivalent to the accuracy of "double," whereby the element values 12.0 and 12.00 of "value" in Document Data 1 and Document Data 2 shown in Figs. 14A and 14B become equal to each other. This process standardizes the type and also makes the documents identical as sequences of characters.
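Steps S301 to S303 can be sketched minimally with several assumptions. The class definition file is taken to be in the Microsoft XDR style suggested by the quoted `<ElementType ... dt:type="double"/>` line; the schema text and namespace URIs below are assumptions and are not taken from Fig. 14C. Element names in the document data are assumed to be namespace-free. Python's shortest round-trip `repr` of a float is chosen as the standard spelling for the double type. The `<data>` wrapper element and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of Microsoft XDR schemas (assumed format of the class definition file).
XDR = "{urn:schemas-microsoft-com:xml-data}"
DT = "{urn:schemas-microsoft-com:datatypes}"

# Hypothetical class definition file built around the line quoted from Fig. 14C.
CLASS_DEFINITION = """<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="value" dt:type="double"/>
</Schema>"""


def read_types(schema_text: str) -> dict[str, str]:
    """Steps S301/S302: map each declared element name to its data type."""
    schema = ET.fromstring(schema_text)
    return {
        el.get("name"): el.get(DT + "type")
        for el in schema.iter(XDR + "ElementType")
    }


def standardize_types(document: str, types: dict[str, str]) -> str:
    """Step S303: give every double-typed value one standard spelling."""
    root = ET.fromstring(document)
    for el in root.iter():
        if types.get(el.tag) == "double" and el.text is not None:
            # repr() yields the shortest string that round-trips the double,
            # so "12.0", "12.00" and "1.2e1" all become "12.0".
            el.text = repr(float(el.text))
    return ET.tostring(root, encoding="unicode")


types = read_types(CLASS_DEFINITION)
doc1 = "<data><value>12.0</value></data>"   # cf. Document Data 1
doc2 = "<data><value>12.00</value></data>"  # cf. Document Data 2
print(standardize_types(doc1, types))  # <data><value>12.0</value></data>
print(standardize_types(doc1, types) == standardize_types(doc2, types))  # True
```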

[0072] Fig. 4 is a flowchart showing the operation of the identifier generating method in which, where the target document data is composed of default data and difference data, canonicalization process step S102 includes an additional document data generating step S401 of generating the original document data from the default data and the difference data.






[0073] Fig. 18A is a diagram showing an example of the default data. As shown in the figure, the following is defined as default (see lines 10 and 11).
<up:role>guest</up:role>
<up:age>16</up:age>

[0074] Fig. 18B is a diagram showing an example of an RDF document as target document data. As shown in the figure, the RDF document is defined in two parts: it includes the description of a URI of the default data (<ccpp:defaults rdf:resource="UserProfileDefault"></ccpp:defaults>) and the difference data (<up:role>vip</up:role>) (see lines 12 and 13). This RDF document is transformed into the original document data in the document data generating step S401. Fig. 19 shows the document data after the transformation in the document data generating step S401, for the case where the difference data is overwritten onto the default data. As shown in the figure, the default data is acquired from the URI of the default data, and the difference data is overwritten to transform <up:role>guest</up:role> into <up:role>vip</up:role> (see line 12), thereby obtaining the original document data.
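Document data generating step S401 can be sketched under a simplifying assumption: both the default data and the difference data have already been reduced to property/value pairs. Parsing of the RDF/XML itself is omitted. The resolver table `DEFAULTS_BY_URI` and the function names are hypothetical. Overwriting the defaults with the difference data reconstructs the original document data. That data is then serialized in a fixed key order before hashing, so a profile written out in full receives the same identifier.

```python
import hashlib
import json

# Hypothetical store of default data, keyed by the URI referenced from
# <ccpp:defaults rdf:resource="...">; values follow Fig. 18A.
DEFAULTS_BY_URI = {
    "UserProfileDefault": {"up:role": "guest", "up:age": "16"},
}


def generate_original(defaults_uri: str, difference: dict[str, str]) -> dict[str, str]:
    """Step S401: overwrite the default data with the difference data."""
    original = dict(DEFAULTS_BY_URI[defaults_uri])  # copy; keep the store intact
    original.update(difference)                     # difference data wins
    return original


def identifier(properties: dict[str, str]) -> str:
    """Serialize in sorted key order so equal content gives equal bytes."""
    text = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Fig. 18B style: a reference to the defaults plus a difference.
compact = generate_original("UserProfileDefault", {"up:role": "vip"})
# The same profile written out in full.
explicit = {"up:age": "16", "up:role": "vip"}

print(compact)                                      # {'up:role': 'vip', 'up:age': '16'}
print(identifier(compact) == identifier(explicit))  # True
```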

[0075] Fig. 26 is a block diagram showing a configuration of document data storing-acquiring system 2700. This system stores target document data along with its identifier into a cache by the identifier generating method of the present embodiment and later acquires the target document data from the cache using the identifier. As shown in the figure, the document data storing-acquiring system 2700 comprises: target document data acquiring part 2701 for acquiring target document data; identifier generating part 2702 on which the identifier generating method of the present embodiment is implemented; identifier storing part 2703 for storing the generated identifier with the target document data into cache 2704; identifier acquiring part 2705 for acquiring, from the cache 2704, an identifier of the document data to be acquired; and document data acquiring part 2706 for acquiring the document data from the cache using the identifier.

[0076] Fig. 13 shows an example of document data identifiers and document data URIs stored in cache 2704 by identifier storing part 2703. As shown in the figure, input document data can be managed by their identifiers. If an identifier is already stored in the cache, the document data may also be discarded without being stored.
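A minimal sketch of cache 2704, with identifier storing part 2703 and document data acquiring part 2706, follows. It assumes, for illustration only, that the identifier is a SHA-256 digest taken after collapsing whitespace, a crude stand-in for the canonicalization process; DocumentCache, store and acquire are hypothetical names.

```python
import hashlib
from typing import Optional


def generate_identifier(document: str) -> str:
    """Assumed identifier generation: a simple canonicalization followed by SHA-256."""
    canonical = " ".join(document.split())  # stand-in for the canonicalization process
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DocumentCache:
    """Hypothetical cache 2704 mapping identifiers to document data."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def store(self, document: str) -> tuple[str, bool]:
        """Identifier storing part: returns (identifier, stored).

        stored is False when the identifier is already in the cache, in which
        case the new document data is discarded instead of being stored again.
        """
        identifier = generate_identifier(document)
        if identifier in self._entries:
            return identifier, False
        self._entries[identifier] = document
        return identifier, True

    def acquire(self, identifier: str) -> Optional[str]:
        """Document data acquiring part: look up document data by identifier."""
        return self._entries.get(identifier)


if __name__ == "__main__":
    cache = DocumentCache()
    first_id, stored = cache.store("<up:role>vip</up:role>")
    second_id, stored_again = cache.store("<up:role>vip</up:role>\n")
    print(first_id == second_id, stored, stored_again)  # prints True True False
```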

[0077] The canonicalization process step S102 shown in Figs. 1, 2, and 4 may also be modified so as to change the sequence of constituent elements in the document data according to a predetermined rule. For example, if the input data is document data consisting of multiple constituent elements as shown in Fig. 21, the constituent elements are rearranged so that the xxxx portions of <rdf:Description rdf:about="xxxx"> appear in alphabetical order. Fig. 27 shows the document data after the constituent elements have been rearranged in alphabetical order. According to the RDF Specification, document data can be regarded as having the same overall meaning regardless of the sequence of its constituent elements. In other words, documents identical in meaning may differ in the sequence of their constituent elements. Therefore, by changing the sequence of constituent elements according to the predetermined rule before generating the identifier, an identical identifier can be generated for multiple document data that differ in the sequence of constituent elements but are identical in meaning.
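The reordering rule can be sketched with Python's xml.etree.ElementTree. The sketch assumes that the predetermined rule sorts only the top-level elements by their rdf:about value, that whitespace-only text is insignificant, and that SHA-256 serves as the one-way function; the component names in the sample documents are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = f"{{{RDF_NS}}}about"  # ElementTree's {namespace}local form of rdf:about


def _drop_whitespace_text(root: ET.Element) -> None:
    """Remove whitespace-only text and tails so that reordering does not move indentation."""
    for node in root.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


def reorder_and_identify(rdf_xml: str) -> str:
    root = ET.fromstring(rdf_xml)
    _drop_whitespace_text(root)
    # Assumed predetermined rule: sort top-level elements alphabetically by rdf:about.
    children = sorted(root, key=lambda el: el.get(RDF_ABOUT, ""))
    for child in list(root):
        root.remove(child)
    root.extend(children)
    return hashlib.sha256(ET.tostring(root, encoding="utf-8")).hexdigest()


DOC_A = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:about="HardwarePlatform"/>
  <rdf:Description rdf:about="BrowserUA"/>
</rdf:RDF>"""

DOC_B = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:about="BrowserUA"/>
  <rdf:Description rdf:about="HardwarePlatform"/>
</rdf:RDF>"""

if __name__ == "__main__":
    print(reorder_and_identify(DOC_A) == reorder_and_identify(DOC_B))  # prints True
```

Sorting top-level elements alone would not cover nested descriptions or blank nodes, which a fuller canonicalization would also need to order.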
[0078] [Second Embodiment] 

[0079] The second embodiment of the present invention will be described below with reference to the drawings. Fig. 5 is a flowchart showing the operation of the identity determining method according to the present embodiment.

[0080] As shown in the figure, the identity determining method is comprised of target document data acquiring step S501 of acquiring document data as a target for a determination on identity; canonicalization process step S502 of correcting fluctuation of expression for the target document data; identifier generating step S503 of generating a unique identifier from the entire target document data or a selected range thereof; and identity determining step S504 of determining identity of multiple document data on the basis of the identifier generated in the identifier generating step S503.

[0081] Figs. 11 and 12 show examples of the document data before the canonicalization process and the document data after the canonicalization process. It is seen from these figures that the canonicalization process corrects the fluctuation of expression.

[0082] Fig. 6 is a flowchart showing the operation of the identity determining method in which the canonicalization process step S502 includes type standardization process step S601 of checking the class definition file of the target document data and standardizing its data types.

[0083] Fig. 3 is a flowchart showing the details of the operation of type standardization process step S601. The details of the operation are the same as in the first embodiment.

[0084] Figs. 14A and 14B are diagrams showing Document Data 1 and Document Data 2, respectively, before the type standardization process; Fig. 14C is a diagram showing an example of the class definition file; and Figs. 15A and 15B are diagrams showing examples of Document Data 1 and Document Data 2, respectively, after the type standardization process. It is seen from these figures that values of "value" written in different expressions are converted into the same expression by the type standardization process step S601.
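The details of type standardization belong to the first embodiment and are not reproduced here, so the following is only a minimal sketch under stated assumptions: the class definition file is reduced to a hypothetical mapping, CLASS_DEFINITION, from property name to declared type, and each declared type has one canonical lexical form (integers in plain decimal without leading zeros or a plus sign, booleans as true or false, loosely following XML Schema lexical rules).

```python
# Hypothetical class definition file, reduced to property name -> declared type.
CLASS_DEFINITION: dict[str, str] = {"value": "integer", "enabled": "boolean"}


def standardize(name: str, raw: str) -> str:
    """Rewrite a property value into one canonical lexical form for its declared type."""
    text = raw.strip()
    declared = CLASS_DEFINITION.get(name)
    if declared == "integer":
        return str(int(text))  # "016", "+16" and " 16 " all become "16"
    if declared == "boolean":
        if text.lower() in ("true", "1"):
            return "true"
        if text.lower() in ("false", "0"):
            return "false"
        raise ValueError(f"not a boolean: {raw!r}")
    return text  # properties without a declared type are only trimmed


if __name__ == "__main__":
    print(standardize("value", "016") == standardize("value", "16"))  # prints True
```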
[0085] Fig. 7 is a flowchart showing the operation of the identity determining method in which, where the target document data consists of default data and difference data, the canonicalization process step S502 additionally includes document data generating step S701 of generating the original document data from the default data and the difference data.

[0086] Figs. 18A and 18B show examples of the default data and the target document data, and Fig. 19 shows an example of the document data after the transformation generated in the document data generating step S701. The details of the operation are the same as in the first embodiment.
[0087] Fig. 8 is a flowchart showing the details of the operation of identity determining step S504. As shown in the figure, the identity determining step S504 is comprised of identifier acquiring step S801 of acquiring an identifier generated from target document data as a target for a determination on identity; identifier storing step S803 of storing the identifier in a cache; and identity determining step S802 of searching the cache for the identifier and determining that the same document data is present if the search succeeds, or absent if the search fails. When the identifier is absent from the cache, it is passed to the identifier storing step S803 and stored in the cache.
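Steps S801 to S803 amount to a set-membership test with insertion on a miss. The sketch below assumes a simple whitespace-collapsing canonicalization and SHA-256 in place of steps S502 and S503; the function names are hypothetical.

```python
import hashlib


def generate_identifier(document: str) -> str:
    """Stand-in for steps S502-S503: simple canonicalization, then SHA-256."""
    canonical = " ".join(document.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def identity_determining_step(identifier: str, cache: set[str]) -> bool:
    """Return True if the same document data is present (S802 search succeeds).

    On a miss, the identifier is stored in the cache (S803) so that later
    document data with the same identifier is recognized.
    """
    if identifier in cache:
        return True
    cache.add(identifier)
    return False


if __name__ == "__main__":
    seen: set[str] = set()
    for doc in ["<a>1</a>", "<a>2</a>", "\n<a>1</a>\n"]:
        print(identity_determining_step(generate_identifier(doc), seen))
    # prints False, False, True on separate lines
```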

[0088] Fig. 10 is a block diagram showing a configuration of identity determining apparatus 1101, which determines by the identity determining method according to the present embodiment whether input document data has already been processed. As shown in the figure, the identity determining apparatus 1101 is comprised of target document data acquiring part 1102 for acquiring target document data; identifier generating part 1103 for generating an identifier from the target document data; identity determining part 1108 for determining, using the generated identifier and cache 1110, whether the data has already been processed; and identifier storing part 1109 for storing the identifier in the cache 1110 for the next identity determination when the identity determining part determines that the data has not already been processed. The identifier generating part 1103 is comprised of document data generating part 1104 for generating the original document data from the default data and difference data; canonicalization process part 1105 for carrying out the canonicalization process to correct the fluctuation of expression; type standardization process part 1106 for carrying out the type standardization process of data, using the class definition file; and identifier generation process part 1107 for generating a unique identifier from the target document data.

[0089] Fig. 16 is a block diagram showing a configuration of an item rewriting process apparatus that uses the identity determining method according to the present embodiment to skip the item rewriting process when the input data is document data that has already been processed. As shown in the figure, the apparatus is composed of identity determining part 1701, on which the identity determining method of the present embodiment is implemented; item rewriting part 1702 for rewriting items of document data in accordance with an item rewriting rule document; and transformed document data cache 1703, which stores the transformed document data generated by the item rewriting part 1702 along with the identifiers of the document data generated by the identity determining part 1701.

[0090] The identity determining part 1701 receives input document data and generates an identifier from it. Using the generated identifier and the transformed document data cache 1703, it determines whether the input data is document data that has already been subjected to item rewriting. If so, the process at the item rewriting part 1702 is skipped, and the output data is the transformed document data retrieved from the transformed document data cache using the identifier. Because the present invention allows the item rewriting process, which generally takes a long time, to be skipped, fast processing can be achieved.
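Assuming, only for this sketch, that the item rewriting rule document can be modeled as a plain function from document text to document text, the skip logic of Fig. 16 might look as follows. ItemRewritingApparatus and its members are hypothetical names, and the identifier is an assumed SHA-256 digest after whitespace collapsing.

```python
import hashlib
from typing import Callable


class ItemRewritingApparatus:
    """Sketch of Fig. 16: skip item rewriting for already-processed document data."""

    def __init__(self, rewrite: Callable[[str], str]) -> None:
        self._rewrite = rewrite  # hypothetical stand-in for item rewriting part 1702
        self._transformed_cache: dict[str, str] = {}  # transformed document data cache 1703
        self.rewrite_calls = 0

    @staticmethod
    def _identifier(document: str) -> str:
        # Assumed identifier generation for identity determining part 1701.
        canonical = " ".join(document.split())
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def process(self, document: str) -> str:
        identifier = self._identifier(document)
        cached = self._transformed_cache.get(identifier)
        if cached is not None:
            return cached  # already rewritten: skip the costly rewriting step
        self.rewrite_calls += 1
        transformed = self._rewrite(document)
        self._transformed_cache[identifier] = transformed
        return transformed


if __name__ == "__main__":
    apparatus = ItemRewritingApparatus(lambda doc: doc.replace("guest", "vip"))
    print(apparatus.process("<up:role>guest</up:role>"))
    print(apparatus.process("<up:role>guest</up:role>\n"))
    print(apparatus.rewrite_calls)  # prints 1
```

Because the second input differs only by a trailing line feed, it maps to the same identifier and the rewrite function runs only once.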

[0091] [Third Embodiment]

[0092] The third embodiment of the present invention will be described below with reference to the drawings. Fig. 9 is a flowchart showing the operation of the identity determining method according to the present embodiment.

[0093] As shown in the figure, the identity determining method is comprised of encoded data acquiring step S901 of acquiring encoded data of document data as a target for a determination on identity; identifier generating step S902 of generating an identifier from all or part of the acquired encoded data; and identity determining step S903 of determining identity of multiple document data on the basis of the identifier generated in the identifier generating step S902.

[0094] Since XML encoding assigns expressions with the same meaning a code that is uniquely predefined by the code transformation rule, the encoded data is in a state in which the fluctuation of expression has already been corrected. In other words, identical encoded data is generated from XML documents with the same meaning; therefore, by generating the identifier from the encoded data, treated as a sequence of characters, with a one-way function or the like, identity can be determined for XML documents with the same meaning.

[0095] Fig. 25 is a diagram showing an example in which Document Data 1 and Document Data 2, documents identical in meaning but different in their sequence of characters, are encoded according to a byte code table to generate identical encoded data. As shown in the figure, encoding Document Data 1 and Document Data 2, which have the same meaning and differ only in the insertion of line feeds, yields identical encoded data. When an identifier is generated from this encoded data, the same identifier is obtained for the multiple document data with the same meaning.
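To illustrate why encoded data already has the fluctuation of expression corrected, the following sketch uses a hypothetical byte code table loosely modeled on WBXML: each tag becomes one byte, text is emitted with a length prefix, and whitespace-only text such as inserted line feeds produces no bytes. The table, the END and TEXT codes, and the function names are assumptions; the identifier is computed from the encoded bytes alone, without decoding them.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical byte code table in the spirit of WBXML: one byte per tag name.
CODE_TABLE = {"profile": 0x10, "role": 0x11, "age": 0x12}
END = 0x00   # closes the current element
TEXT = 0x01  # followed by a one-byte length and UTF-8 text


def encode(document: str) -> bytes:
    """Encode element structure and text; text is trimmed and whitespace-only text dropped.

    Simplifications: tails (mixed content) and attributes are ignored, and each
    text run is limited to 255 bytes.
    """
    out = bytearray()

    def walk(element: ET.Element) -> None:
        out.append(CODE_TABLE[element.tag])
        text = (element.text or "").strip()
        if text:
            data = text.encode("utf-8")
            out.append(TEXT)
            out.append(len(data))  # raises ValueError if the text exceeds 255 bytes
            out.extend(data)
        for child in element:
            walk(child)
        out.append(END)

    walk(ET.fromstring(document))
    return bytes(out)


def identifier_from_encoded(encoded: bytes) -> str:
    return hashlib.sha256(encoded).hexdigest()


DOC_1 = "<profile><role>vip</role><age>16</age></profile>"
DOC_2 = "<profile>\n  <role>vip</role>\n  <age>16</age>\n</profile>"

if __name__ == "__main__":
    print(encode(DOC_1) == encode(DOC_2))  # prints True
```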

[0096] In Fig. 16, the input document data is encoded data resulting from data compression of an XML document. When encoded data is supplied, it generally has to be decoded before it can be processed. However, the identity determining method according to the present embodiment permits identity to be determined while the data is still encoded, and thus also permits the decoding process to be skipped for document data that has already been processed.

[0097] [Fourth Embodiment] 

[0098] The fourth embodiment of the present invention will be described below with reference to the drawings. Fig. 17 is a flowchart showing the operation of the identifier transmitting method according to the present embodiment.

[0099] As shown in the figure, the identifier transmitting method is comprised of target document data acquiring step S1801 of acquiring document data as a target; identifier generating step S1802 of carrying out the canonicalization process and the type standardization process for the target document data and generating an identifier from all or part of the document data; identifier replacement process step S1803 of replacing all or part of the document data with the generated identifier; difference data adding step S1804 of defining the document data, or the part thereof replaced with the identifier, as default and adding difference data from that default to the document data; and transmitting step S1805 of transmitting the document data generated through the above processes.

[0100] Fig. 20 is a block diagram showing a configuration of identifier transmitting apparatus 2101 for transmitting identifier-integrated document data, using the identifier transmitting method according to the present embodiment. As shown in the figure, the identifier transmitting apparatus 2101 is comprised of target document data acquiring part 1102 for acquiring target document data; identifier generating part 1103 for carrying out the canonicalization process and the type standardization process and generating an identifier; and identifier transmitting part 2102 for performing an identifier adding process on the target document and transmitting the document data.
[0101] The identifier transmitting part 2102 is comprised of identifier replacement processing part 2103 for replacing all or part of the document data






with an identifier generated by the identifier generating part 1103; difference data adding part 2104 for defining the document data, or the part thereof replaced with the identifier, as default and adding difference data from that default to the document data; and transmitting part 2105 for transmitting the document data generated through the above processes.

[0102] Fig. 21 shows an example of target document data. As shown in the figure, three constituent elements <ccpp:component>...</ccpp:component> are described in the target document data. When the target document data is processed in the identifier generating step S1802, an identifier is generated for each constituent element.

[0103] Fig. 22 shows an example of document data transmitted by the identifier transmitting method according to the present embodiment: partly identifier-integrated document data in which partial descriptions of the document data shown in Fig. 21 (constituent element 1 and constituent element 2) are replaced by their respective identifiers, and constituent element 3 is described as additional data. As seen from the figure, replacing constituent element 1 and constituent element 2 with identifiers reduces the data volume of the document data as a whole.
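Identifier replacement process step S1803 can be sketched as follows, assuming per-element identifiers computed with SHA-256 after collapsing whitespace and a hypothetical <ref id="..."/> element as the reference syntax. The sketch covers only the replacement; the difference data adding step S1804 is not modeled.

```python
import hashlib


def element_identifier(element: str) -> str:
    """Assumed identifier for one constituent element: collapsed whitespace + SHA-256."""
    canonical = " ".join(element.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_partly_identifier_integrated(elements: list[str], already_sent: set[str]) -> str:
    """Replace constituent elements whose identifiers were already transmitted.

    <ref id="..."/> is a hypothetical reference syntax chosen for this sketch.
    """
    parts = []
    for element in elements:
        identifier = element_identifier(element)
        if identifier in already_sent:
            parts.append(f'<ref id="{identifier}"/>')
        else:
            parts.append(element)  # new element is sent in full, like constituent element 3
    return "".join(parts)
```

A 64-character hexadecimal identifier only saves space when the element it replaces is larger than the reference, which is the typical case for full component descriptions.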
[0104] Fig. 23 is a diagram showing an example of a service system using the partly identifier-integrated document data transmitted by the identifier transmitting method according to the present embodiment. As shown in the figure, the service system is composed of terminal 2400, which transmits the partly identifier-integrated document data by the identifier transmitting method of the present embodiment; proxy 2401, which receives the partly identifier-integrated document data, expands the partial identifiers back into the original constituent elements to regenerate the original document data, and transmits it; and server 2402, which receives the document data and provides a service. The proxy 2401 is connected to database 2403, in which partial identifiers and the original constituent elements are stored in association with each other. Before transmitting document data intended for the server, the terminal 2400 can therefore replace each constituent element that has already been transmitted through the proxy with its identifier, so the method of the present embodiment can be used as a document data transmitting method that places a lighter load on the network.
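On the proxy side, expansion is the inverse lookup. This sketch assumes a hypothetical <ref id="..."/> reference syntax with hexadecimal identifiers and models database 2403 as a Python dictionary from identifier to constituent element.

```python
import re

# Matches the hypothetical <ref id="..."/> syntax, where id is a hex identifier.
REF_PATTERN = re.compile(r'<ref id="([0-9a-f]+)"/>')


def expand(document: str, database: dict[str, str]) -> str:
    """Proxy 2401: restore each referenced constituent element from database 2403.

    An identifier missing from the database raises KeyError, because the
    original document data cannot be rebuilt without it.
    """
    return REF_PATTERN.sub(lambda match: database[match.group(1)], document)
```

Passing a function to re.sub inserts the returned element text literally, so backslashes or group-like sequences inside a stored element are not reinterpreted.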

[0105] [Fifth Embodiment]

[0106] The fifth embodiment of the present invention will be described below with reference to the drawings. Fig. 24 is a flowchart showing the operation of the step-by-step identity determining method according to the present embodiment.
[0107] As shown in the figure, the step-by-step identity determining method is comprised of first determination step S2501 of generating an identifier directly from input document data and making a determination on identity; second determination step S2502 of performing the canonicalization process, then generating an identifier, and thereafter making a determination on identity; third determination step S2503 of performing the type standardization process with the use of the class definition file, then generating an identifier, and making a determination on identity; and result output step S2504 of outputting the result of the determination.

[0108] When no identity is recognized in the first determination step S2501, the processing is transferred to the second determination step S2502. If no identity is recognized in the second determination step S2502 either, the processing is transferred to the third determination step S2503. When identity is recognized in either the first determination step S2501 or the second determination step S2502, the remaining determination steps are skipped and the result of the determination is output in the result output step S2504. The third determination step S2503 passes its result of the determination on identity to the result output step S2504, which then outputs it.
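The control flow of steps S2501 to S2504 can be sketched as a chain of progressively more expensive checks. The sketch assumes that identifiers of previously processed documents are recorded at all three levels (the hypothetical register function), uses SHA-256 as the one-way function, and substitutes crude stand-ins for canonicalization (whitespace collapsing) and type standardization (removing leading zeros from numbers).

```python
import hashlib
import re
from typing import Optional


def _identifier(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonicalize(document: str) -> str:
    """Hypothetical stand-in for the canonicalization process."""
    return " ".join(document.split())


def standardize_types(document: str) -> str:
    """Hypothetical stand-in for type standardization: drop leading zeros of numbers."""
    return re.sub(r"\b0+(\d)", r"\1", document)


def register(document: str, known: dict[str, set[str]]) -> None:
    """Record identifiers of a processed document at all three levels."""
    canonical = canonicalize(document)
    known["first"].add(_identifier(document))
    known["second"].add(_identifier(canonical))
    known["third"].add(_identifier(standardize_types(canonical)))


def step_by_step_identity(document: str, known: dict[str, set[str]]) -> Optional[str]:
    """Return the step at which identity was recognized, or None (S2504 output)."""
    if _identifier(document) in known["first"]:  # S2501: identifier from raw input
        return "first"
    canonical = canonicalize(document)
    if _identifier(canonical) in known["second"]:  # S2502: after canonicalization
        return "second"
    if _identifier(standardize_types(canonical)) in known["third"]:  # S2503
        return "third"
    return None
```

Putting the cheapest check first means that exact repeats never pay for canonicalization or type standardization.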

[0109] The present invention enables an identical identifier to be generated for multiple document data with the same meaning, or for portions thereof, and thus enables multiple document data with the same meaning to be identified.

[0110] The present invention also reduces processing time at terminals or servers, because the identity determining method permits processing to be skipped when input data is document data that has already been processed.
[0111] The present invention also permits identity to be determined while document data is still encoded, and thus enables processing, including the decoding process, to be skipped at terminals or servers, decreasing the processing time.

[0112] Since the present invention also enables document data to be generated by replacing all or part of document data with an identifier generated from all or part of document data with the same meaning, the data volume of document data and the load on the network can be reduced, for example, by replacing an already-transmitted portion with an identifier when transmitting the document data.

[0113] From the invention thus described, it will be obvious that the invention may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended for inclusion within the scope of the following claims.